Experiments with SVM and Stratified Sampling with an Imbalanced Problem: Detection of Intestinal Contractions
Authors
Abstract
In this paper we show some preliminary results of our research in the field of classification of imbalanced datasets with SVM and stratified sampling. Our main goal is to deal with the clinical problem of automatic detection of intestinal contractions in endoscopic video images. The prevalence of contractions is very low, which leads to highly skewed training sets. Stratified sampling together with SVM has been reported in the literature to behave well in this kind of problem. We applied both the SMOTE algorithm developed by Chawla et al. and under-sampling, in a cascade system implementation, to deal with the skewed training sets in the final SVM classifier. We show comparative results for both sampling techniques using precision-recall curves, which appear to be useful tools for performance testing.
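The two sampling strategies named in the abstract, and the precision and recall measures used to compare them, can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: the function names `smote`, `undersample`, and `precision_recall`, the toy data, and the parameter values are all assumptions, and the SVM classifier and the cascade system are left out entirely. The over-sampling step follows the general idea of SMOTE (interpolating between a minority sample and one of its nearest minority-class neighbours) in simplified form.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class, without replacement."""
    return random.Random(seed).sample(majority, n_keep)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 4 rare positives against 40 negatives.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority = [[float(i), 5.0] for i in range(40)]

new_points = smote(minority, n_new=6, k=2)   # over-sample the rare class
kept = undersample(majority, n_keep=10)      # shrink the common class
```

Either resampled set would then be used to train a classifier. Sweeping the classifier's decision threshold and calling `precision_recall` at each setting yields the points of a precision-recall curve.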
Similar resources
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Imbalanced Data SVM Classification Method Based on Cluster Boundary Sampling and DT-KNN Pruning
This paper presents an SVM classification method based on cluster boundary sampling and sample pruning. We actively explore an effective solution to solve the difficult problem of imbalanced data set classification from data re-sampling and algorithm improving. Firstly, we creatively propose the method of cluster boundary sampling, using the clustering density threshold and the boundary density ...
A Selective Sampling Method for Imbalanced Data Learning on Support Vector Machines
The class imbalance problem in classification has been recognized as a significant research problem in recent years and a number of methods have been introduced to improve classification results. Rebalancing class distributions (such as over-sampling or under-sampling of learning datasets) has been popular due to its ease of implementation and relatively good performance. For the Support Vector...
A Fuzzy-Evolutionary Method for Software Defect Detection
Software defects detection is one of the most important challenges of software development and it is the most prohibitive process in software development. The early detection of fault-prone modules helps software project managers to allocate the limited cost, time, and effort of developers for testing the defect-prone modules more intensively. In this paper, according to the importance of soft...
SVM Classification for High-dimensional Imbalanced Data based on SNR and Under-sampling
Support vector machines (SVMs) are biased towards the majority class when the dataset is class-imbalanced, and the bias is even larger for high-dimensional data. In order to improve the classification accuracy of SVM on high-dimensional imbalanced data, we combine signal-noise ratio (SNR) and an under-sampling technique based on K-means. In this article we first apply SNR to feature selection to r...
Publication date: 2005